• Delegated Replies: Alleviating Network Clogging in Heterogeneous Architectures 

      Zhao, Xia; Eeckhout, Lieven; Jahre, Magnus (Peer reviewed; Journal article, 2022)
      Heterogeneous architectures with latency-sensitive CPU cores and bandwidth-intensive accelerators are attractive as they deliver high performance at favorable cost. These architectures typically have significantly more ...
    • Get Out of the Valley: Power-Efficient Address Mapping for GPUs 

      Yuxi, Liu; Zhao, Xia; Jahre, Magnus; Wang, Zhenlin; Wang, Xiaolin; Lou, Yingwei; Eeckhout, Lieven (Journal article; Peer reviewed, 2018)
      GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support 100s to 1000s of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional ...
    • HSM: A Hybrid Slowdown Model for Multitasking GPUs 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Graphics Processing Units (GPUs) are increasingly widely used in the cloud to accelerate compute-heavy tasks. However, GPU-compute applications stress the GPU architecture in different ways --- leading to suboptimal resource ...
    • NUBA: Non-Uniform Bandwidth GPUs 

      Zhao, Xia; Jahre, Magnus; Tang, Yuhua; Zhang, Guangda; Eeckhout, Lieven (Chapter, 2023)
      The parallel execution model of GPUs enables scaling to hundreds of thousands of threads, which is a key capability that many modern high-performance applications exploit. GPU vendors are hence increasing the compute and ...
    • Selective Replication in Memory-Side GPU Caches 

      Zhao, Xia; Jahre, Magnus; Eeckhout, Lieven (Chapter, 2020)
      Data-intensive applications put immense strain on the memory systems of Graphics Processing Units (GPUs). To cater to this need, GPU memory systems distribute requests across independent units to provide high bandwidth by ...